Abstract

We present a $(1+\varepsilon)$-approximation algorithm for computing the minimum spanning
tree of points in a planar arrangement of lines, where the metric is the number of crossings
between the spanning tree and the lines. The expected running time of the algorithm is near
linear. We also show how to embed such a crossing metric of hyperplanes in $d$ dimensions,
in subquadratic time, into high dimensions so that the distances are preserved. As a result,
we can deploy a large collection of subquadratic approximation algorithms [IM98, GIV01] for
problems involving points with the crossing metric as a distance function. Applications
include MST, matching, clustering, nearest-neighbor, and furthest-neighbor.

1 Introduction

Given a set of lines in the plane, a natural measure of distance between any two points
is the number of lines one has to cross to reach from one point to the other. This is a
discrete distance measure that can be used to approximate the Euclidean distance and other
distance measures. However, since this measure is defined by an arrangement of lines, it is
not locally defined and is thus computationally cumbersome. Finding the minimum spanning
tree (MST) of a set of points, so that the number of intersections between the tree and the
given set of lines is minimized, quantifies how the set of points interacts with the set of
lines; see Figure 1. In fact, when the set of lines is the set of all possible lines, then this
MST is the standard Euclidean MST [AF97] (here one minimizes the average number of edges of
the MST crossed when picking a random line). Such an MST is related to a spanning tree
of low stabbing number (STLSN) [Wel92, Aga91]. While the spanning tree of low stabbing
number guarantees that any line intersects at most $O(\sqrt{n})$ edges of the spanning tree, the
MST guarantees that the overall number of intersections between the tree and a given set
of lines is minimized. Thus, if we have the set of lines in advance, then the MST will have
fewer intersections overall than the STLSN. The spanning tree of low stabbing number was
used in several applications; see for example [Aga91, MWW91]. In particular, having such
an MST enables one: (i) to answer half-plane range queries efficiently using near linear
space [GHS91], (ii) to bound the complexity of the faces of the arrangement of lines
that contain the points [HS01], and (iii) to traverse between the points efficiently, so
that the number of updates needed is minimized. (Imagine traversing among the points and
maintaining the set of half-planes that contain the current point. Each time one crosses a
line, an update operation is performed.)

Figure 1: A set of lines and points, and the resulting crossing MST. Note that in this case
the crossing MST is different from the Euclidean MST.

Computing the MST for the general case of arcs can be done in $O(n^2 \log n)$ time by
performing wavefront propagation from each of the points (see Section 2). As for approximation
algorithms, Har-Peled and Sharir [HS01] recently gave an approximation algorithm for the
case of arcs, computing a Steiner tree in expected running time
$w = O(\lambda_{t+2}(n + W_{opt}) \log n)$, where $t$ is the maximum number of intersections
between a pair of arcs, $\lambda_{t+2}(\cdot)$ is the maximum length of a Davenport-Schinzel
sequence of order $t$, and $W_{opt}$ is the weight of the optimal Steiner tree.¹ The algorithm
outputs a tree of weight $w$ (and thus gives roughly an $O(\log n)$-approximation).

In this paper, we present two results:

• A near linear time $(1+\varepsilon)$-approximation algorithm for the minimum spanning tree
under the crossing metric in the planar case.

• We show how to embed the crossing metric among hyperplanes into a Hamming distance
in high dimensions. As a result, we show how one can apply known subquadratic
approximation algorithms for problems involving point-sets and hyperplanes in high
dimensions (MST, clustering, matching, etc.).



¹ It is easy to verify that if the triangle inequality holds, then the weight of the Steiner
tree is at least half the weight of the MST.



The connection between the crossing metric and points in high dimension follows by
interpreting the input points as points in the abstract VC-space [PA95] induced by the lines.
Namely, we associate with each point in the plane an $n$-dimensional binary vector,
where the $i$-th coordinate indicates on which side of the $i$-th line the point lies. In this
way, we map our input points into points lying on the $n$-dimensional hypercube. The
crossing metric is no more than the Hamming distance between the mapped points.
We can now apply the techniques of [IM98] to those mapped points, yielding an
approximation algorithm for the MST problem. Bringing down the running time to be
subquadratic requires some additional work.

Specifically, we show how to compute a mapping of the points into a space of dimension
$O(\log^7 n)$; this embedding can be computed in $\widetilde{O}(n^{4/3})$ time,² for $n$ points,
so that a $(1+\varepsilon)$ gap property is preserved for a specified range of distances.

As a result, we can solve several approximation problems for this metric, among them
the MST problem. In fact, our near-linear approximate MST algorithm in the plane
can be roughly viewed as an unraveling of the corresponding MST approximation
algorithm in high dimensions. Similar bounds can be derived for $d > 2$ dimensions.
See Section 4 for details.

The paper is organized as follows: In Section 2, we describe how one can compute
the exact MST using wavefront propagation. In Section 3, we present the planar
$(1+\varepsilon)$-approximation algorithm for the MST. Next, in Section 4, we describe the
embedding into points in high dimension and demonstrate its usage for computing an
approximate MST. Concluding remarks are given in the last section.



2 Minimum Spanning Tree by Continuous Dijkstra 

In this section, we present a simple algorithm for computing the crossing MST. It relies on
a simple direct solution interpreted as a geometric algorithm. We also present a "weight
sensitive" algorithm (Lemma 2.7) that computes portions of the MST in time proportional
to its overall weight.

In the following, we assume that we are given a set $L$ of lines and a set $P$ of points in
the plane. For simplicity, we assume $|P| = |L| = n$.

Definition 2.1 For a set $L$ of lines, the crossing metric is defined to be the minimum number
of lines of $L$ that one has to cross as one moves between two prespecified points. Thus, for
a pair of points $p, q \in \mathbb{R}^2$, the crossing distance between $p$ and $q$, denoted by
$\mathcal{D}_L(p, q)$, is the number of lines of $L$ that intersect the segment $pq$. If $L$ is a
set of arcs, a similar crossing metric is defined, although the "shortest path" in this case
is no longer necessarily a straight segment.
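Definition 2.1 can be illustrated with a short sketch. It assumes, purely for illustration, that each line is given as a coefficient triple `(a, b, c)` for $ax + by + c = 0$ and that no input point lies on a line; the names `side` and `crossing_distance` are hypothetical and do not come from the paper.

```python
def side(line, p):
    """Signed value of a*x + b*y + c at point p (assumed nonzero)."""
    a, b, c = line
    return a * p[0] + b * p[1] + c


def crossing_distance(lines, p, q):
    """Number of lines that intersect the segment pq.

    A line crosses the segment exactly when p and q lie strictly
    on opposite sides of it.
    """
    return sum(1 for ln in lines if side(ln, p) * side(ln, q) < 0)


# Three lines: x = 0, y = 0 and x = 5.
lines = [(1, 0, 0), (0, 1, 0), (1, 0, -5)]
print(crossing_distance(lines, (-1, -1), (1, 1)))
print(crossing_distance(lines, (1, 1), (6, 1)))
```

This brute-force evaluation takes time linear in the number of lines per pair of points, which is the cost the rest of the paper tries to avoid.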

Definition 2.2 For a set $L$ of lines, and a set $P$ of points in the plane, let $T_{opt}(P, L)$
denote a minimum spanning tree of $P$ under the crossing metric induced by $L$, and let
$W_{opt}(P, L)$ denote the weight of $T_{opt}(P, L)$.

² Here and in the rest of this paper, $f(n) = \widetilde{O}(g(n))$ iff
$f(n) = O(g(n) (1/\varepsilon)^{O(1)} \log^{O(1)} n)$, and $f(n) = O_\varepsilon(g(n))$ iff
$f(n) = O(g(n)/\varepsilon^{O(1)})$.

Figure 2: Computing the crossing MST by doing wavefront propagation. The thick lines
denote the boundary of the current connected components of the spanning forest.

Let $\mathcal{A} = \mathcal{A}(L)$ denote the planar arrangement induced by the lines of $L$. Let
$G_{adj} = G_{adj}(\mathcal{A})$ be the adjacency graph of $\mathcal{A}$; namely, each face of
$\mathcal{A}$ is a vertex, and two vertices are connected if the two corresponding faces share an
edge. Let $V$ be the set of vertices of $G_{adj}$ that correspond to the faces of $\mathcal{A}$
that contain points of $P$. Clearly, the crossing MST of $P$ in $\mathcal{A}$ corresponds to the
MST of $V$ in the graph $G_{adj}$ (here, each edge has associated weight 1).

Computing the MST of $V$ in $G_{adj}$ can be done by performing a simultaneous flooding of
$G_{adj}$ from the vertices of $V$. Indeed, we compute in the $i$-th iteration all the vertices
of $G_{adj}$ that are at distance $\le i$ from any vertex of $V$. This can be easily done using a
modified BFS. In the beginning, the flood front is made out of $n$ connected components. Every
time two connected components of the flood front collide, we have discovered a new edge of the
MST. This edge connects the two vertices that induced the two parts of the wavefront that
collided. This is a somewhat non-standard algorithm for computing the MST, but one can easily
verify that it indeed computes the MST of $V$ in $G_{adj}$.
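The flooding idea can be sketched on an abstract unit-weight graph. This is a simplified variant, not the online procedure described above: it first labels every vertex with its nearest source by a multi-source BFS, then collects one candidate edge per collision between two flooded regions, and finally runs Kruskal's algorithm over the candidates. The adjacency dictionary `adj` is a hypothetical stand-in for $G_{adj}$, and `terminals` stands in for $V$.

```python
from collections import deque


def flooding_mst(adj, terminals):
    """MST of the terminals under shortest-path distance in a unit-weight graph.

    adj: dict mapping each vertex to an iterable of neighbours.
    Returns a list of (terminal_a, terminal_b, weight) tree edges.
    """
    dist, owner = {}, {}
    queue = deque()
    for t in terminals:
        dist[t], owner[t] = 0, t
        queue.append(t)
    # Simultaneous flooding: every vertex is claimed by the first front to reach it.
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], owner[v] = dist[u] + 1, owner[u]
                queue.append(v)
    # A graph edge whose endpoints belong to different fronts is a collision.
    candidates = set()
    for u in adj:
        if u not in dist:
            continue
        for v in adj[u]:
            if v in dist and owner[u] != owner[v]:
                a, b = sorted((owner[u], owner[v]))
                candidates.add((dist[u] + dist[v] + 1, a, b))
    # Kruskal over the collision candidates, with a small union-find.
    parent = {t: t for t in terminals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, a, b in sorted(candidates):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree


# A path 0-1-2-3-4-5-6 with terminals 0, 2 and 6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(flooding_mst(path, [0, 2, 6]))
```

Sorting the candidates by weight plays the role of handling collisions in order of increasing distance.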

This flooding algorithm has a natural geometric interpretation: Let $\mathcal{F}_i$ denote the
set of all faces of $\mathcal{A}$ that are at (crossing) distance at most $i$ from some point of
$P$. Clearly, $\mathcal{F}_0$ is the set of faces of $\mathcal{A}$ that contain points of $P$. The
algorithm works in $n/2$ phases. We do a wavefront propagation in $G_{adj}$, starting from all
the vertices that correspond to the marked faces (i.e., faces of $\mathcal{A}$ that contain points
of $P$). In each iteration, we propagate the wavefront from the faces of $\mathcal{F}_{i-1}$ into
the faces of $\mathcal{F}_i$. It is easy to verify that a connected component of the flood
corresponds to a connected component of the wavefront of $\mathcal{F}_i$. (Note that two faces of
$\mathcal{F}_i$ might be adjacent but belong to different wavefronts, as the wavefronts did not
cross the separating edge yet and thus were not merged into a single wavefront.) The connected
components are maintained implicitly by a union-find data-structure. In particular, during the
$i$-th iteration of the wavefront propagation in $G_{adj}$, when two different connected
components of the wavefront collide, this corresponds to two points of $P$ with crossing
distance equal to $2i - 1$ or $2i$ from each other.

In particular, if there is an edge of the MST of weight $2i - 1$ or $2i$, it would be discovered
when the corresponding wavefronts collide. The $i$-th iteration of the wavefront propagation
corresponds to the detection of edges of weight $2i - 1$ and $2i$ in the MST. For the MST
applications, we first handle all relevant edges of weight $2i - 1$, and later all such edges of
weight $2i$. This requires a somewhat careful implementation, and we omit the technical
but straightforward details. See Figure 2.

Note that the wavefront propagation can be done without constructing $G_{adj}$ in advance,
and one can compute parts of $G_{adj}$ on the fly as needed (i.e., we need to compute only the
parts of $G_{adj}$ that are covered by the wavefront, or are about to be covered). Of course, in
the worst case, the whole graph $G_{adj}$ would be computed, which takes $O(n^2 \log n)$ time
(this corresponds to computing the whole arrangement $\mathcal{A}(L)$).

Lemma 2.3 Given a set $L$ of $n$ lines, and a set $P$ of $n$ points, a minimum spanning tree
$T_{opt}(P, L)$ of $P$ under the crossing metric $\mathcal{D}_L$ can be computed in
$O(n^2 \log n)$ time.



Remark 2.4 In the algorithm of Lemma 2.3 we did not use the fact that $L$ is a set of lines.
The same algorithm will work for the case where $L$ is a set of arcs. Since we do not have the
triangle inequality in this case, the edges of the MST are no longer line segments, but rather
Jordan arcs. (For example, imagine that the set $L$ is a single segment and we would like to
connect two points that are separated by this segment. This can be done with no crossing
by going "around" this segment.)

To be able to generate parts of $G_{adj}$ incrementally, as we perform the wavefront
propagation, we need a way to compute the relevant portions of $\mathcal{A}(L)$ on the fly.



Theorem 2.5 ([HS01]) Let $L$ be a set of $n$ lines, as above, and $P$ a set of $m$ points in the
plane. Then one can compute, in expected $O((n + w + m)\alpha(n) \log n)$ time, a Steiner tree
$\mathcal{M}$ of $P$, so that the expected weight of $\mathcal{M}$ is
$O((n + w)\alpha(n) \log n)$, where $w = W_{opt}(P, L)$ and $\alpha(n)$ is the inverse of the
Ackermann function. Alternatively, one can compute the $m$ faces that contain the points of $P$
in the same time bound.

Lemma 2.6 ([Aga91]) There exists a Steiner tree $\mathcal{M}'$ of $P$, so that
$W_{opt}(P, L) = O(n\sqrt{n})$, and this is tight in the worst case (even for the case where
the arcs are lines).



In the worst case, Theorem 2.5 is inferior to implicit point-location data-structures
[AMS98] (which can perform the implicit point-location needed in roughly $O(n^{4/3})$ time
for $m = n$), as implied by Lemma 2.6 (as the weight of the MST is $\Omega(n^{3/2})$ in the
worst case, and this is the time to compute the relevant portions of the arrangement using the
algorithm of Theorem 2.5). However, the running time of the algorithm of Theorem 2.5 is
sensitive to the overall weight of the MST. This would be crucial for our algorithm.

Lemma 2.7 Given a set $L$ of $n$ lines, a set $P$ of $n$ points, and a parameter $i$, one can
compute, in expected $O(i(n + W_{opt})\alpha^2(n) \log n)$ time, a minimum spanning forest of
$P$ under the crossing metric $\mathcal{D}_L$, that connects all the points of $P$ at distance
at most $2i$ from each other, where $W_{opt} = W_{opt}(P, L)$.

Proof: The wavefront propagation on $G_{adj}$ can be done using an implicit representation
of the arrangement $\mathcal{A}(L)$. Namely, we compute the set $\mathcal{F}_i$ of faces of
$\mathcal{A}(L)$ at distance at most $i$ from the points of $P$. Observe that the complexity of
$\mathcal{F}_i$ is $O((n + W_{opt}) i \alpha(n/i))$. Indeed, the points of $P$ can be connected
by an arc $\gamma = T_{opt}(P, L)$ having $O(W_{opt})$ intersections with the lines of $L$, and
let $\mathcal{A}'$ be the arrangement resulting from $\mathcal{A}$ by creating a tiny gate for
each intersection of $\gamma$ with the lines of $L$. The zone of $\gamma$ in $\mathcal{A}(L)$
corresponds to a single face $F$ of $\mathcal{A}'$, and the faces of $\mathcal{F}_i$ are
contained in the set of faces at distance $\le i$ from $F$. By [BDS95], the complexity of this
region is $O((n + W_{opt}) i \alpha(n/i))$ (this is a bound on the complexity of all the
vertices at distance $\le i$ from the face $F$).

Clearly, the faces of $\mathcal{F}_i$ have a spanning tree of weight
$O((n + W_{opt}) i \alpha(n/i))$, and so it can be computed in an online fashion in
$O((n + W_{opt}) i \alpha^2(n) \log n)$ expected time, by Theorem 2.5.



3 Approximation Algorithm for the Planar Case 

The algorithm is depicted in Figure 3, Figure 4, and Figure 5. We next describe the algorithm
and its analysis in more detail.






Algorithm ApproxMST(P, L, ε)

    Input:  A set of points P, a set of lines L, and an approximation parameter ε
    Output: A spanning tree of P of weight ≤ (1 + ε) W_opt(P, L)
begin
    M ← approximate the weight of the MST using the algorithm of Lemma A.6
    l_0 ← max( 1, εM / (c_5 n α(n) log n) )
    Set F = (P, ∅) to be an empty spanning forest of P.
    PropagateApproxWavefront(P, L, l_0, F)
    i ← 1
    while F is not a single connected component do
        l_i ← 2 · l_{i−1}
        PropagateApproxWavefront(P, L, l_i, F)
        i ← i + 1
    end while
    return F
end ApproxMST

Figure 3: Approximating the MST in the plane



Lemma 2.7 provides us with an algorithm for approximating the MST in roughly quadratic
time in the worst case. To get a near linear running time, we simulate the Dijkstra algorithm
by performing the wavefront propagation in an approximate fashion.

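Definition 3.1 A metric $\mathcal{D}'$ $\varepsilon$-approximates a metric $\mathcal{D}$, if for
any $p, q, r, s \in P$ such that $\mathcal{D}'(p, q) \le \mathcal{D}'(r, s)$ we have
$\mathcal{D}(p, q) \le (1 + \varepsilon)\mathcal{D}(r, s)$.

Definition 3.2 For a set $F$ of segments in the plane, and a metric $\mathcal{D}$, let
$\mathrm{weight}_{\mathcal{D}}(F) = \sum_{e \in F} \mathcal{D}(e)$ denote the total weight of $F$
under the metric $\mathcal{D}$.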

The proof of the following lemma is straightforward, and is included only for the sake of 
completeness. 

Lemma 3.3 Let the metric $\mathcal{D}'$ be an $\varepsilon$-approximation to the metric
$\mathcal{D}$ over a point-set $P$. Let $T'$ be an MST of $P$ under $\mathcal{D}'$. Then,
$\mathrm{weight}_{\mathcal{D}}(T') \le (1 + \varepsilon)\mathrm{weight}_{\mathcal{D}}(T)$, where
$T$ is the MST of $P$ under $\mathcal{D}$, and $\mathrm{weight}_{\mathcal{D}}(T)$ is the total
weight of the edges of $T$.

Proof: Let $e'_1, \ldots, e'_{n-1}$ be the edges of $T'$ sorted by their weight
$\mathcal{D}'(e'_1) \le \ldots \le \mathcal{D}'(e'_{n-1})$. Let $T_0 = T$, and let $T_i$ be the
tree resulting from removing the heaviest edge (according to $\mathcal{D}'$) from the cycle
present in $T_{i-1} \cup \{e'_i\}$ (if $e'_i$ is already in $T_{i-1}$ we do nothing). Let $e_i$
denote this removed edge. Clearly, $\mathcal{D}'(e'_i) \le \mathcal{D}'(e_i)$ and, by definition,
$\mathcal{D}(e'_i) \le (1 + \varepsilon)\mathcal{D}(e_i)$. Namely, we replaced an edge $e_i$ by
an edge $e'_i$ which is heavier by at most a factor of $(1 + \varepsilon)$. At the end of the
process $T_{n-1}$ is just $T'$, and
$\mathrm{weight}_{\mathcal{D}}(T') \le \sum_{i=1}^{n-1} (1 + \varepsilon)\mathcal{D}(e_i) \le (1 + \varepsilon)\mathrm{weight}_{\mathcal{D}}(T)$. ■

Lemma 3.3 suggests that if we can find a computationally cheaper approximate metric
than $\mathcal{D}_L(\cdot, \cdot)$, then we can use it to compute the MST. A natural way to do
that is to randomly sample a subset $R \subseteq L$, and use $\mathcal{D}_R(\cdot, \cdot)$ as the
approximate metric. However, it is easy to verify that $\mathcal{D}_R$ is an
$\varepsilon$-approximate metric to $\mathcal{D}_L$ only if $L = R$.

Algorithm PropagateWavefront(P, R, l, F)

    Input:  P - set of points
            R - set of lines
            l - propagation distance
            F - current spanning forest
    Output: An updated forest F with any pair of points of distance ≤ 2l in a
            single connected component
begin
    Initialize the data-structure D(R) of [HS01] for online point-location.
    Set W_0 to be the set of faces of A(R) that contain points of P.
    Use D(R) to compute those faces.
    for i = 1, ..., l do
        W_i ← set of faces of A(R) of distance = i from points of P.
        Do wavefront propagation from W_{i−1}, and use D(R) to retrieve
            the faces of interest in A(R).
        if two different wavefronts collide then
            Add an edge connecting the two corresponding points to F.
            Merge the corresponding connected components.
        end if
    end for
end PropagateWavefront

Figure 4: Doing the wavefront propagation

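Definition 3.4 Let $\mathcal{D}', \mathcal{D}$ be two metrics, and let $\varepsilon > 0$ and $l$
be prescribed parameters. The metric $\mathcal{D}'$ is an $(\varepsilon, l)$-approximation to
$\mathcal{D}$, if for any $p, q, r, s \in P$, such that (i)
$\mathcal{D}(p, q), \mathcal{D}(r, s) \ge l$, and (ii)
$\mathcal{D}'(p, q) \le \mathcal{D}'(r, s)$, we have
$\mathcal{D}(p, q) \le (1 + \varepsilon)\mathcal{D}(r, s)$.

Namely, $\mathcal{D}'$ $\varepsilon$-approximates $\mathcal{D}$ for distances not smaller than $l$.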
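Definition 3.4 can be checked by brute force on a small point-set. The following sketch assumes that both metrics are supplied as plain Python functions; the name `is_eps_l_approximation` is hypothetical, and the quadratic loop over pairs of pairs is meant only to spell out the quantifiers, not to be efficient.

```python
from itertools import combinations


def is_eps_l_approximation(points, d, d_approx, eps, l):
    """Check whether d_approx is an (eps, l)-approximation to d over `points`."""
    # Condition (i): only pairs whose true distance is at least l are considered.
    pairs = [pq for pq in combinations(points, 2) if d(*pq) >= l]
    for p, q in pairs:
        for r, s in pairs:
            # Condition (ii) holds but the conclusion fails: not an approximation.
            if d_approx(p, q) <= d_approx(r, s) and d(p, q) > (1 + eps) * d(r, s):
                return False
    return True


points = [0, 1, 2, 3]
exact = lambda a, b: abs(a - b)
scaled = lambda a, b: 0.5 * abs(a - b)   # preserves the ordering of distances
blind = lambda a, b: 1                   # forgets all distance information
print(is_eps_l_approximation(points, exact, scaled, 0.0, 1))
print(is_eps_l_approximation(points, exact, blind, 0.5, 1))
print(is_eps_l_approximation(points, exact, blind, 0.5, 3))
```

The last call illustrates the role of $l$: a metric that is useless for short distances can still qualify once only sufficiently long distances are compared.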

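Definition 3.5 For $l, \varepsilon$, let
$\nu(l, \varepsilon) = \min\left(128 c_{samp} \frac{\log n}{l \varepsilon^2}, 1\right)$, where
$c_{samp}$ is an appropriate constant. Let $\mathcal{RS}(L, l, \varepsilon)$ be a random subset
of $L$ generated by picking independently each line of $L$ with probability
$\nu(l, \varepsilon)$.

Let $\rho(l, \varepsilon) = \nu(l, \varepsilon) l = 128 c_{samp} \frac{\log n}{\varepsilon^2}$.
The value $\rho(l, \varepsilon)$ is the expected crossing distance in
$\mathcal{A}(\mathcal{RS}(L, l, \varepsilon))$ between two points $p, q \in P$ such that
$\mathcal{D}_L(p, q) = l$.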
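The sampling step of Definition 3.5 can be sketched as follows. The default value `c_samp=1.0`, the use of the natural logarithm, and the function names are assumptions made only for this illustration; the paper leaves the constant and the base of the logarithm unspecified.

```python
import math
import random


def sampling_probability(l, eps, n, c_samp=1.0):
    """nu(l, eps): probability of keeping each line, capped at 1."""
    return min(1.0, 128.0 * c_samp * math.log(n) / (l * eps * eps))


def random_sample(lines, l, eps, rng, c_samp=1.0):
    """Pick each line independently with probability nu(l, eps)."""
    nu = sampling_probability(l, eps, len(lines), c_samp)
    return [ln for ln in lines if rng.random() < nu], nu


def expected_sample_distance(l, eps, n, c_samp=1.0):
    """rho(l, eps) = nu(l, eps) * l: expected sample distance of a pair at distance l."""
    return sampling_probability(l, eps, n, c_samp) * l


rng = random.Random(0)
lines = [(1, 0, -i) for i in range(1, 1001)]          # the lines x = 1, ..., x = 1000
sample, nu = random_sample(lines, l=10**6, eps=1.0, rng=rng)
print(nu, len(sample))
```

Whenever the cap at 1 is not active, `expected_sample_distance` does not depend on `l`, which is why the propagation in the sample only needs to run for $O(\log n / \varepsilon^2)$ steps.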

Lemma 3.6 Let $L$ be a set of $n$ lines in the plane, $l$ a positive integer, $\varepsilon > 0$,
and let $R = \mathcal{RS}(L, l, \varepsilon)$ be a random subset of $L$.

For any two points $p, q$ of distance $\mathcal{D}_L(p, q) \ge l$ from each other we have

$$\mathcal{D}_L(p, q) \le \frac{1}{\nu(l, \varepsilon)(1 - \varepsilon/4)} \cdot \mathcal{D}_R(p, q) \le (1 + \varepsilon)\mathcal{D}_L(p, q),$$

with probability $\ge 1 - n^{-c_{samp}}$.

Furthermore, $\mathcal{D}_R(\cdot, \cdot)$ is an $(\varepsilon, l)$-approximation to
$\mathcal{D}_L(\cdot, \cdot)$ with high probability.

Algorithm PropagateApproxWavefront(P, L, l, F)

    Input:  P - set of points
            L - set of lines
            l - starting propagation distance
            F - current spanning forest
    Output: An updated forest F with any pair of points of distance ≤ 2l in a
            single connected component
begin
    Compute a random sample R by choosing each line of L into the sample with
        probability ν(l) = 128 c_6 log n / (l ε²)
    /* Approximate the wavefront propagation in A(L) by doing
       it (exactly) in A(R) */
    PropagateWavefront(P, R, c_7 log n / ε², F)
    /* c_7 is an appropriate constant */
end PropagateApproxWavefront

Figure 5: Doing the approximate wavefront propagation



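Proof: Indeed, let $X_{pq} = \mathcal{D}_R(p, q)$. We have

$$\mu = E[X_{pq}] = \mathcal{D}_L(p, q) \cdot \nu(l, \varepsilon) = 128 \mathcal{D}_L(p, q) c_{samp} \frac{\log n}{l \varepsilon^2} \ge \frac{128 c_{samp} \log n}{\varepsilon^2}.$$

By the Chernoff inequality [MR95, MPS98], we have that

$$\Pr\left[ |X_{pq} - \mu| > \frac{\varepsilon}{4}\mu \right] \le 2\left( \frac{e^{\varepsilon/4}}{(1 + \varepsilon/4)^{1 + \varepsilon/4}} \right)^{\mu} = 2\exp\left( \mu\left( \frac{\varepsilon}{4} - \left(1 + \frac{\varepsilon}{4}\right)\log\left(1 + \frac{\varepsilon}{4}\right) \right) \right)$$

$$\le 2\exp\left( \mu\left( \frac{\varepsilon}{4} - \left(1 + \frac{\varepsilon}{4}\right)\left(\frac{\varepsilon}{4} - \frac{\varepsilon^2}{32}\right) \right) \right) \le 2\exp\left( -\mu\frac{\varepsilon^2}{64} \right) \le 2\exp\left( -\frac{128 c_{samp} \log n}{\varepsilon^2} \cdot \frac{\varepsilon^2}{64} \right) \le n^{-c_{samp}},$$

since $\log(1 + x) \ge x - x^2/2$, for $0 \le x \le 1$. In particular, this implies that with
high probability $\mu(1 - \varepsilon/4) \le X_{pq} \le \mu(1 + \varepsilon/4)$. Namely, with
high probability we have

$$\mathcal{D}_L(p, q) \le \frac{X_{pq}}{\nu(l, \varepsilon)(1 - \varepsilon/4)} \le \frac{\nu(l, \varepsilon)(1 + \varepsilon/4)}{\nu(l, \varepsilon)(1 - \varepsilon/4)}\mathcal{D}_L(p, q) = \frac{1 + \varepsilon/4}{1 - \varepsilon/4}\mathcal{D}_L(p, q) \le (1 + \varepsilon)\mathcal{D}_L(p, q).$$

Consider now four points $p, q, r, s$, such that
$\mathcal{D}_L(p, q), \mathcal{D}_L(r, s) \ge l$ and
$\mathcal{D}_R(p, q) \le \mathcal{D}_R(r, s)$. By the above discussion, we have with high
probability

$$\mathcal{D}_L(p, q) \cdot \nu(l, \varepsilon)(1 - \varepsilon/4) \le \mathcal{D}_R(p, q) \le \mathcal{D}_R(r, s) \le (1 + \varepsilon)\mathcal{D}_L(r, s) \cdot \nu(l, \varepsilon)(1 - \varepsilon/4).$$

Namely, $\mathcal{D}_L(p, q) \le (1 + \varepsilon)\mathcal{D}_L(r, s)$. Applying this to all
pairs of points of $P$, we conclude that $\mathcal{D}_R(\cdot, \cdot)$ is an
$(\varepsilon, l)$-approximation to $\mathcal{D}_L(\cdot, \cdot)$ with high probability. ■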



Lemma 3.6 and Lemma 3.3 suggest that we compute the MST by computing an appropriate
random sample $R$ (by using a threshold $l$), and deploying the algorithms of Section 2
to compute the MST of $P$ in $\mathcal{A}(R)$. Such an MST would be an approximate MST. There
are two main problems with this approach: (i) for short distances (i.e., $l = 1$), just
starting the wavefront propagation (i.e., Lemma 2.7) is prohibitively expensive (it roughly
takes $O(W_{opt}(P, L))$ time, which might be $\Omega(n^{3/2})$), and (ii) for long distances
(i.e., $\gg 1$), the wavefront propagation becomes, again, prohibitively expensive (i.e.,
$O(ni)$) by Lemma 2.7.



Corollary 3.7 Let $U$ be the total weight of all the edges of $T$ having weight less than
$\varepsilon W_{opt}(P, L)/(10n)$. Then $U \le \varepsilon W_{opt}(P, L)/10$.
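The bound of Corollary 3.7 follows from a one-line count, sketched here under the assumption that $T$ is a spanning tree of the $n$ points of $P$ (so it has $n - 1$ edges); the symbol $E_{light}$ for the set of light edges is introduced only for this derivation.

```latex
U \;=\; \sum_{e \in E_{light}} \mathcal{D}_L(e)
  \;<\; |E_{light}| \cdot \frac{\varepsilon W_{opt}(P,L)}{10n}
  \;\le\; (n-1) \cdot \frac{\varepsilon W_{opt}(P,L)}{10n}
  \;<\; \frac{\varepsilon W_{opt}(P,L)}{10}.
```

In other words, even if every edge of the tree were light, the light edges together could account for at most an $\varepsilon/10$ fraction of the optimal weight.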

Lemma A.6 describes how we can approximate $W_{opt}(P, L)$ to within a polylogarithmic
factor using random sampling in near linear time. Since the algorithm of this lemma is very
similar to the techniques used below, we defer its description to the appendix. Equipped
with such an approximation $M$, we know by Corollary 3.7 that we do not "care" about edges of
the MST of length smaller than $l_0 = O(\varepsilon M/(n \,\mathrm{polylog}(n)))$. In particular,
we can generate a random sample $R_0$ which provides an $(\varepsilon, l_0)$-approximation to
$\mathcal{D}_L(\cdot, \cdot)$. Thus, we can approximate the MST by computing
$T_{opt}(P, R_0)$.

This, however, does not address the second problem. Indeed, computing
$T_{opt}(P, R_0)$ might still be too expensive, as the following lemma testifies.

Lemma 3.8 Given a set $L$ of $n$ lines, a set $P$ of $n$ points, and parameters
$l, i, \varepsilon, U$, such that $l = \Omega(W_{opt}(P, L)/(nU))$, let
$R = \mathcal{RS}(L, l, \varepsilon)$ be a random sample of $L$. Then, one can compute, in
expected $\widetilde{O}(iUn)$ time, a minimum spanning forest of $P$ under the crossing metric
$\mathcal{D}_R$, that connects all the points of $P$ at distance at most $2i$ from each other.

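Proof: Let $X$ denote the size of $R$. Clearly, the expected value of $X$ is

$$E[X] = n\nu(l, \varepsilon) = 128 n c_{samp} \frac{\log n}{l \varepsilon^2} = O\left( \frac{n \log n}{l \varepsilon^2} \right),$$

by Definition 3.5. Let $\gamma = T_{opt}(P, L)$. Let $Y = \mathrm{weight}(\gamma, R)$. Clearly,

$$E[Y] = \mathrm{weight}(\gamma, L)\,\nu(l, \varepsilon) = W_{opt}(P, L)\nu(l, \varepsilon) = O\left( \frac{Un \log n}{\varepsilon^2} \right).$$

Namely, $E[W_{opt}(P, R)] \le E[Y] = O\left( \frac{Un \log n}{\varepsilon^2} \right)$. The running
time bound now follows immediately by applying the algorithm of Lemma 2.7 to $P$ and $R$. ■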
The algorithm of Lemma 3.8 first performs wavefront propagation for distances in
$\mathcal{A}(R)$ which are smaller than $\rho(l, \varepsilon)$. For such distances,
$\mathcal{A}(R)$ does not provide a reliable estimate (i.e., ordering) of the crossing distances
between points. However, once the distances propagated exceed $\rho(l, \varepsilon)$, we know by
Lemma 3.6 that the distances are now $(\varepsilon, l)$-approximated correctly. The main
importance of the algorithm of Lemma 3.8 is that the algorithm has near linear running time
for small values of $U$ and $i$.



Using Lemma 3.8 together with Corollary 3.7 implies that we can compute a spanning
forest for the "short" edges of $T_{opt}(P, L)$ in near linear time.

Lemma 3.9 Given a set $P$ of $n$ points in the plane, and a set $L$ of $n$ lines in the plane,
one can compute a spanning forest $F$ of $P$, such that the weight of $F$ is
$\le \varepsilon W_{opt}(P, L)/10$. Furthermore, every pair of points of $P$ at distance at most
$l$, for some $l = \Omega(W_{opt}(P, L)\varepsilon/(n \log^3 n))$, belongs to the same connected
component of $F$. The running time of this algorithm is $\widetilde{O}(n)$.

Proof: Using the algorithm of Lemma A.6, compute in $\widetilde{O}(n)$ time a number $M$ such
that $W_{opt}(P, L) \le M = O(n\alpha(n) \log^2 n + W_{opt}(P, L)\alpha(n) \log n)$. In
particular, let

$$l_{short} = \frac{\varepsilon M}{c_1 n \log^3 n} \le \frac{\varepsilon}{40n} W_{opt}(P, L), \qquad (1)$$

for $c_1$ large enough. On the other hand, $l_{short} = \Omega(W_{opt}(P, L)/(Un))$, where
$U = O((\log^3 n)/\varepsilon)$.

We now compute a spanning forest for $P$, using Lemma 3.8 with $l_{short}$ and $U$ as specified
and $i = 2\rho(l_{short}, \varepsilon)$. The running time of this algorithm is

$$\widetilde{O}(iUn) = \widetilde{O}(\rho(l_{short}, \varepsilon) n) = \widetilde{O}\left( \frac{\log n}{\varepsilon^2} \cdot n \right) = \widetilde{O}(n).$$

Clearly, $F$ has at most $n$ edges, and all the points of $P$ at distance $\le l_{short}$ are in
the same connected component of $F$ by Lemma 3.6.

Furthermore, for any edge $pq$ of $F$, we have that with high probability
$\mathcal{D}_L(p, q) \le 2(1 + \varepsilon) l_{short} \le 4 l_{short}$ by Lemma 3.6. In
particular, $\mathrm{weight}(F, L) \le 4n l_{short} \le (\varepsilon/10) W_{opt}(P, L)$. ■
Lemma 3.9 implies that we can compute a cheap spanning forest of $P$ in near linear time
that "captures" all the light edges of the MST. Next, we can compute the rest of the edges
of the MST using Lemma 3.8 repeatedly.

Lemma 3.10 Given a set $P$ of $n$ points in the plane, a set $L$ of $n$ lines in the plane, a
parameter $\varepsilon > 0$, and a spanning forest $F$ of $P$, such that every pair of points of
$P$ at distance $\le l$ belongs to the same connected component of $F$, where
$l = \Omega(W_{opt}(P, L)\varepsilon/(n \log^3 n))$. Then, one can compute a spanning forest
$F'$ of $P$ such that all the points of $P$ at distance $\le 2l$ belong to the same connected
component of $F'$. The forest $F'$ can be computed in $\widetilde{O}(n)$ expected time.



Proof: We use the same algorithm as in Lemma 3.9, with the modification that when calling
the algorithm of Lemma 3.8, we pass on $F$, so that the algorithm ignores generated edges
whose endpoints belong to the same connected component of $F$. It is again clear that only
edges of length between $l$ and $2(1 + \varepsilon)l$ would be added to the spanning forest. The
exact details of how to specify $U$ and $i$ are similar to Lemma 3.9, and are omitted. ■
Our algorithm for computing the MST works by first using Lemma 3.9. This results in a
spanning forest $F_0$ of the points of $P$, and a value $l_{short}$ as specified by Equation (1).
We now use Lemma 3.10 repeatedly $O(\log n)$ times, in the $i$-th iteration handling distances
between $2^{i-1} l_{short}$ and $2 \cdot 2^i l_{short}(1 + \varepsilon)$, for
$i = 1, \ldots, O(\log n)$, till we handle all distances $\le n$. Namely, in the $i$-th
iteration, we compute a spanning forest $F_i$ of all points at distance $\le 2^i l_{short}$ from
each other using Lemma 3.10, with $F_{i-1}$ as our "starting" spanning forest.
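The doubling schedule of the outer loop can be sketched in a few lines. This only enumerates the distance thresholds handled in successive iterations; the name `doubling_schedule` is hypothetical, and the actual work of each iteration (the call to Lemma 3.10) is not modeled.

```python
def doubling_schedule(l_short, n):
    """Thresholds 2^i * l_short for i = 1, 2, ... until all distances <= n are covered."""
    thresholds = []
    l = l_short
    while l < n:
        l *= 2
        thresholds.append(l)
    return thresholds


print(doubling_schedule(1, 8))
print(doubling_schedule(3, 10))
print(len(doubling_schedule(1, 10**6)))
```

Since the threshold doubles in every round and no crossing distance exceeds $n$, the number of rounds is logarithmic in $n / l_{short}$.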



Clearly, the expected running time of the resulting algorithm is $\widetilde{O}(n)$. What is not
clear is that the resulting MST is indeed an $\varepsilon$-approximate MST.

Lemma 3.11 With high probability, the tree $T$ computed by the above algorithm is an
$\varepsilon$-approximate MST of $P$ in $\mathcal{A}(L)$.



Proof: All the edges generated by the algorithm of Lemma 3.9, in the first stage of the
algorithm, have total weight $\le (\varepsilon/20) W_{opt}(P, L)$ with high probability.

Let $T_{opt} = T_{opt}(P, L)$ be the optimal spanning tree. If $T$ is not an
$\varepsilon$-approximate MST, then
$\mathrm{weight}_{\mathcal{D}_L}(T) > (1 + \varepsilon)\mathrm{weight}_{\mathcal{D}_L}(T_{opt})$.
In particular, there must be an edge of $T_{opt}$ whose insertion into $T$ would result in a
substantially lighter spanning tree. Formally, for an edge $e$, let $T(e)$ be the tree resulting
from $T$ by inserting $e$ into $T$, and removing from $T$ the heaviest (according to
$\mathcal{D}_L$) edge on the new cycle that was created, and let $\mathrm{out}(T, e)$ denote
this "ejected" edge.

Arguing as in the proof of Lemma 3.3, it must be that there exists an edge $\phi = pq$ of
$T_{opt}$ such that

$$(1 + \varepsilon)\mathcal{D}_L(\phi) < \mathcal{D}_L(\mathrm{out}(T, \phi)),$$

and $\mathcal{D}_L(\phi) \ge W_{opt}(P, L)/(20n)$.

Let $i$ be the index such that $2^{i-1} l_{short} \le \mathcal{D}_L(\phi) \le 2^i l_{short}$.
With high probability, we know that after the $i$-th iteration $p$ and $q$ are in the same
connected component of $F_i$. Assume that $p$ and $q$ were not in the same connected component
of $F_{i-1}$ (the other case is easier and as such is omitted).

Let $T''$ be the spanning forest maintained by the algorithm just after $p$ and $q$ were
present in the same connected component. With high probability, for any edge $e''$ of $T''$,
we have $\mathcal{D}_L(e'') \le (1 + \varepsilon)\mathcal{D}_L(\phi)$, since the random sample
$R_i$ we used in the $i$-th iteration is a $(\varepsilon, 2^{i-1} l_{short})$-approximation to
$\mathcal{D}_L$.

But then, it is not possible that the algorithm added $\mathrm{out}(T, \phi)$ to the spanning
tree $T''$, as all the edges on the cycle in $T'' \cup \{\phi\}$ are lighter than
$(1 + \varepsilon)\mathcal{D}_L(\phi)$. A contradiction. ■

We summarize our result: 



Theorem 3.12 Given a set $P$ of $n$ points in the plane, $L$ a set of $n$ lines, and
$\varepsilon > 0$ a parameter. Then one can compute a spanning tree $T$ of $P$, in
$\widetilde{O}(n)$ expected time, such that
$\mathrm{weight}(T, L) \le (1 + \varepsilon) W_{opt}(P, L)$. The result is correct with high
probability.



4 Approximation Algorithms for the Intersection Metric via Embeddings

Let $P = \{p_1, \ldots, p_n\}$ be a given set of $n$ points, and $L = \{l_1, \ldots, l_m\}$ be a
set of $m$ lines, where $m = n^{O(1)}$. As mentioned earlier, the metric $\mathcal{D}_L$ is
computationally cumbersome. One possible way to overcome this problem is to embed this metric
into a more convenient metric (while introducing a small distortion error).

In this section, we show a somewhat weaker result. We show how to embed the points of
$P$ into $O(\log^7 n)$-dimensional space in $\widetilde{O}(n + m + n^{2/3} m^{2/3})$ time, so
that a specific distance gap in the crossing metric is mapped to a corresponding gap in the
target space.

We first observe that the crossing distance between two points $p$ and $q$ can be computed
by interpreting this distance as a Hamming distance on the hypercube in $m$ dimensions
induced by the lines. Namely, each line $l$ contributes a coordinate: a point gets a '1' in
this coordinate if it is on one side of $l$, and a '0' if it is on the other side of $l$.
Formally, let $l^+$ denote the open half-plane defined by a line $l$ that contains the origin,
and let $l^-$ denote the other open half-plane. For a point $p \in \mathbb{R}^2$, let
$v_L(p) = (b_1, \ldots, b_m)$ be an $m$-bit vector so that $b_i = 1$ iff $p \in l_i^+$. It is
easy to verify that $\mathcal{D}_L(p, q) = d_H(v_L(p), v_L(q))$, where $d_H$ is the Hamming
distance.
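The bit-vector view can be sketched as follows, packing $v_L(p)$ into a Python integer. The line representation `(a, b, c)` for $ax + by + c = 0$ is an assumption of this sketch, as is the requirement that no line passes through the origin or through an input point (so that "the side containing the origin" is well defined); the function names are hypothetical.

```python
def sign_vector(lines, p):
    """Pack v_L(p) into an int: bit i is 1 iff p lies on the origin's side of line i."""
    bits = 0
    for i, (a, b, c) in enumerate(lines):
        # The value of a*x + b*y + c at the origin is c, so p is on the
        # origin's side exactly when its value has the same sign as c.
        if (a * p[0] + b * p[1] + c) * c > 0:
            bits |= 1 << i
    return bits


def hamming(u, v):
    """Hamming distance between two packed bit vectors."""
    return bin(u ^ v).count("1")


# The lines x = 1, y = 1 and x = -3.
lines = [(1, 0, -1), (0, 1, -1), (1, 0, 3)]
vp = sign_vector(lines, (0.5, 0.5))
vq = sign_vector(lines, (2, 2))
print(vp, vq, hamming(vp, vq))
```

The segment from $(0.5, 0.5)$ to $(2, 2)$ crosses the lines $x = 1$ and $y = 1$ but not $x = -3$, matching the Hamming distance of the two vectors.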

Definition 4.1 Let $R \subseteq L$, and let $f_R : \mathbb{R}^2 \to \mathbb{Z}$ be the mapping
that maps a point $p$ in the plane to its face ID in the arrangement $\mathcal{A}(R)$. Formally,
we assign to each face in the arrangement $\mathcal{A}(R)$ a unique integer (say, an integer
between 1 and $O(|R|^2)$). The mapping $f_R$ maps a point $p$ in the plane to the integer
identifying the face that contains $p$. (Note that this does not uniquely define $f_R(\cdot)$,
as we did not specify how we assign the IDs to the faces.)

For a set $\mathcal{R} = (R_1, \ldots, R_\mu)$ of subsets of $L$, let
$f_{\mathcal{R}} : \mathbb{R}^2 \to \mathbb{Z}^\mu$ be the mapping
$f_{\mathcal{R}}(p) = (f_{R_1}(p), f_{R_2}(p), \ldots, f_{R_\mu}(p))$. For two points
$p, q \in \mathbb{R}^2$, let $d_H(f_{\mathcal{R}}(p), f_{\mathcal{R}}(q))$ be the Hamming
distance between $f_{\mathcal{R}}(p)$ and $f_{\mathcal{R}}(q)$. Namely, this is the number of
coordinates where the two vectors $f_{\mathcal{R}}(p)$ and $f_{\mathcal{R}}(q)$ disagree.

One can view $f_{\mathcal{R}}$ as an embedding of the crossing metric $\mathcal{D}_L$ into the
Hamming space.
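The mapping of Definition 4.1 can be sketched without building any arrangement, under one simplifying assumption: instead of an integer ID, each face of $\mathcal{A}(R)$ is identified by the tuple of sides on which it lies with respect to the lines of $R$. Two points get the same tuple exactly when they lie in the same face, which is all the Hamming distance needs. The names below are hypothetical, and this brute-force labeling is not the subquadratic point-location procedure the paper relies on.

```python
def face_id(sample, p):
    """Identify the face of A(sample) containing p by its tuple of sides."""
    return tuple(a * p[0] + b * p[1] + c > 0 for (a, b, c) in sample)


def embed(samples, p):
    """f_R(p): one face identifier per sample R_1, ..., R_mu."""
    return tuple(face_id(sample, p) for sample in samples)


def hamming(u, v):
    """Number of coordinates where the two embedded vectors disagree."""
    return sum(1 for x, y in zip(u, v) if x != y)


# Three samples: {x = 0}, {y = 10}, and {x = 0, y = 0}.
samples = [[(1, 0, 0)], [(0, 1, -10)], [(1, 0, 0), (0, 1, 0)]]
p, q = (-1, 1), (1, 1)
print(hamming(embed(samples, p), embed(samples, q)))
```

Here the first and third samples each contain a line separating the two points, while the second does not, so the embedded vectors differ in two coordinates.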

Lemma 4.2 Given a set $P$ of $n$ points in the plane, a set $L$ of lines in the plane, a
parameter $\varepsilon > 0$, and a parameter $r$. One can compute a set $\mathcal{R}$ of $\mu$
subsets of $L$, such that for the embedding $f_{\mathcal{R}} : \mathbb{R}^2 \to \mathbb{Z}^\mu$,
we have that, with high probability, for any $p, q \in P$ it holds:

• If $\mathcal{D}_L(p, q) \le r$, then $d_H(f(p), f(q)) \le M$,

• If $\mathcal{D}_L(p, q) \ge (1 + \varepsilon)r$, then
$d_H(f(p), f(q)) \ge (1 + \varepsilon)(1 - \alpha/\log n)M$,

where $M$ and $\alpha$ are appropriate constants and $\mu = O(\log^4 n)$.

Proof: For sake of simplicity of exposition, we assume that m/r > logn, where m = \L\. 
If this is not correct, we can add "fictitious" lines to L that have all the points of P on one 
side of them. If we pick such a line to a set of 7Z, we can ignore it when we compute the face 
IDs. 

For a parameter $\alpha$ to be specified shortly, let $k = \alpha m/r$, let $R$ be a sample of $k$ lines out of
$L$ (drawn with replacement), and let $p, q$ be two points of $P$. Let $\rho = \mathcal{D}_L(p,q)/m$ be the
probability that a single random line of $L$ separates $p$ from $q$. The probability that $p, q$
will be in two different faces of $\mathcal{A}(R)$ is

$$ U(\rho) = 1 - (1 - \rho)^k, $$

as this is the probability that not all the lines will miss the segment connecting $p$ and $q$.
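
As a sanity check of this formula, the sketch below enumerates every ordered sample of k lines (with replacement) for a tiny instance and compares the exact fraction of separating samples with the closed form. The setup is hypothetical: lines are just indices 0..m-1, and the first d of them are taken to be the ones separating p from q.

```python
from itertools import product


def U(rho, k):
    """Probability that at least one of k independent draws is a separating line."""
    return 1.0 - (1.0 - rho) ** k


def exact_separation_probability(m, d, k):
    """Enumerate all m**k ordered samples drawn with replacement from m lines,
    of which lines 0..d-1 separate p and q, and return the fraction of samples
    that contain at least one separating line."""
    hits = sum(1 for s in product(range(m), repeat=k) if any(i < d for i in s))
    return hits / m ** k


m, d, k = 5, 2, 3
exact = exact_separation_probability(m, d, k)   # 1 - (3/5)**3 = 98/125
formula = U(d / m, k)
```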

Our target is to approximate the value of $U(\rho)$ so that we can decide whether $p, q$ are close
or far. Indeed, if $U(\rho) \ge U((1+\varepsilon)r/m)$ then $\mathcal{D}_L(p,q) \ge (1+\varepsilon)r$, and if $U(\rho) \le U(r/m)$
then $\mathcal{D}_L(p,q) \le r$.

To do so, we generate a set of subsets $\mathcal{R} = (R_1, \ldots, R_\mu)$ by random sampling as described
above, where $\mu$ will be specified shortly. Now we consider the quality of the distance
approximation provided by the embedding.³ Let $X(p,q)$ denote the random variable which is
the number of arrangements among $\mathcal{A}(R_1), \ldots, \mathcal{A}(R_\mu)$ that have $p, q$ in different faces. Note that
$X(p,q)$ is equal to the Hamming distance between $f_{\mathcal{R}}(p)$ and $f_{\mathcal{R}}(q)$, and it is thus the distance
between the images of $p$ and $q$ in the new space. Clearly, as $\mu$ tends to infinity, $X(p,q)/\mu$
tends to $U(\rho)$. Using the Chernoff inequality, we can quantify the quality of approximation
provided by $f_{\mathcal{R}}$. Specifically, let $z = U(r/m)$ and $Z = U((1+\varepsilon)r/m)$; in the following we will
make sure that $Z < 1/2$. Then, from the Chernoff bound [MR95, MPS98] it follows that for
any $\alpha > 0$, if $\mu = (C \log n)/(z\alpha^2)$ for some constant $C$, then with high probability:

³ A similar analysis (in the context of Hamming spaces) appeared already in [Ind00]; in our case, however,
we have to put more care into the analysis, since we want $\varepsilon$ and $\varepsilon'$ to be very close.

• if $\mathcal{D}_L(p,q) \le r$ then $X(p,q)/\mu \le z(1+\alpha)$

• if $\mathcal{D}_L(p,q) \ge r(1+\varepsilon)$ then $X(p,q)/\mu \ge Z(1-\alpha)$

Therefore, the mapping $f_{\mathcal{R}}$ converts the distance gap $r : (1+\varepsilon)r$ into the gap $z(1+\alpha)\mu :
Z(1-\alpha)\mu$. We next fine tune $k$ (the size of each sample) so that the resulting gap will be
as large as possible. (Intuitively, the larger the target gap is, the easier it is to detect it in
later stages.) Therefore, in the following we focus on finding $k$ such that the ratio

$$ \frac{Z(1-\alpha)\mu}{z(1+\alpha)\mu} $$

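is as large as possible. To this end, we observe that

$$ z = U\!\left(\frac{r}{m}\right) = 1 - \left(1 - \frac{r}{m}\right)^{k} \le 1 - e^{-rk/m}\left(1 - \frac{(rk/m)^2}{k}\right) \le 1 - e^{-\alpha}(1-\alpha^2) \le \alpha^2 + (1 - e^{-\alpha})(1-\alpha^2) \le \alpha^2 + \alpha(1-\alpha^2) \le \alpha(1+\alpha), $$

since $(1 + t/n)^n \ge e^{t}(1 - t^2/n)$ [MR95], $k = \alpha m/r$, and $x \ge 1 - e^{-x}$. Furthermore,

$$ Z = U((1+\varepsilon)r/m) = 1 - (1 - (1+\varepsilon)r/m)^{k} \ge 1 - e^{-(1+\varepsilon)rk/m} = 1 - e^{-(1+\varepsilon)\alpha} \ge (1+\varepsilon)\alpha - ((1+\varepsilon)\alpha)^2 \ge (1+\varepsilon)\alpha\,(1 - (1+\varepsilon)\alpha), $$

since $(1 - t/n)^n \le e^{-t}$ [MR95] and $1 - e^{-x} \ge x - x^2/2 \ge x - x^2$.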
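
The two bounds on $z$ and $Z$ can be spot-checked numerically. The sketch below does this for one arbitrary choice of parameters (the values of m, r, ε and n are ours, chosen so that m/r is at least log n); it also leaves k unrounded, which is a simplification. It is a plausibility check for a single input, not a proof.

```python
import math


def U(rho, k):
    return 1.0 - (1.0 - rho) ** k


def check_bounds(m, r, eps, n):
    """Return (z, upper bound on z, Z, lower bound on Z) for alpha = 1/log n
    and k = alpha * m / r (k is not rounded to an integer here)."""
    alpha = 1.0 / math.log(n)
    k = alpha * m / r
    z = U(r / m, k)
    Z = U((1 + eps) * r / m, k)
    upper_z = alpha * (1 + alpha)
    lower_Z = (1 + eps) * alpha * (1 - (1 + eps) * alpha)
    return z, upper_z, Z, lower_Z


eps, n = 0.1, 10**4
z, upper_z, Z, lower_Z = check_bounds(m=10**6, r=100, eps=eps, n=n)
alpha = 1.0 / math.log(n)
ratio_bound = (1 + eps) * (1 - (2 + eps) * alpha)   # the lower bound on Z/z implied by the two bounds
```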
Therefore

$$ \frac{Z}{z} \ge \frac{(1+\varepsilon)\alpha\,(1 - (1+\varepsilon)\alpha)}{\alpha(1+\alpha)} \ge (1+\varepsilon)(1 - (1+\varepsilon)\alpha)(1-\alpha) \ge (1+\varepsilon)(1 - (2+\varepsilon)\alpha), $$

since $1/(1+x) \ge 1 - x$. Thus, if we set $\alpha$ to be $1/\log n$, then the distance gap becomes
(at least)

$$ \frac{Z(1-\alpha)\mu}{z(1+\alpha)\mu} \ge (1+\varepsilon)(1 - (2+\varepsilon)\alpha)(1-\alpha)^2 \ge (1+\varepsilon)\left(1 - \frac{a}{\log n}\right), $$

where $a$ is an appropriate constant. Also, note that the resulting value of $z$ is

$$ z = 1 - (1 - r/m)^{k} \ge 1 - e^{-rk/m} = 1 - e^{-\alpha} \ge \alpha - \alpha^2/2 = \Omega(1/\log n), $$

and $\mu = (C \log n)/(z\alpha^2) = O(\log^2 n/\alpha^2) = O(\log^4 n)$. Finally, since $m/r \ge \log n$, we have
that $k = \alpha(m/r) = (1/\log n)(m/r) \ge 1$ (i.e., the sample size $k$ is at least 1). ■

Lemma 4.3 Given a set $P$ of $n$ points, and a set $L$ of $m$ lines, one can compute the function
$f_{\mathcal{R}}(\cdot)$ of Lemma 4.2 for all the points of $P$ in $O(m^{2/3}n^{2/3} + m + n)$ expected time.



Proof: We have to compute, for each point of $P$, the face that contains it in each of the
arrangements $\mathcal{A}(R_1), \ldots, \mathcal{A}(R_\mu)$, where $\mu = O(\log^4 n)$. Alternatively, we compute all the faces
of $\mathcal{A}(R_1), \ldots, \mathcal{A}(R_\mu)$ that contain points of $P$. For a single arrangement $\mathcal{A}(R_i)$ this can be done
in $O(m^{2/3}n^{2/3}\log^{2/3}(m/\sqrt{n}) + (m+n)\log m)$ expected time [AMS98]. Since there are $\mu$
coordinates (i.e., arrangements), the result follows. ■

Thus, we showed how to embed $\mathcal{D}_L$ into the $\mu$-dimensional Hamming space $\Sigma^\mu$ in
$O(n + m + n^{2/3}m^{2/3})$ time, mapping a $(1+\varepsilon)$ gap between close and far points into a gap of size
$(1+\varepsilon)(1 - O(1)/\log n)$, where $\mu = O(\log^4 n)$ and $\Sigma \subseteq \mathbb{Z}$ is the set of face labels we use (i.e.,
$|\Sigma| = O(m^2)$). By using standard embedding techniques (e.g., see [KOR00]) we can embed the
Hamming space $\Sigma^\mu$ into $\{0,1\}^D$ with $D = O(\mu \log|\Sigma| \log^2 n) = O(\log^6 n \log m)$, preserving
the gap up to another factor $(1 - O(1)/\log n)$. This gives an embedding of $\mathcal{D}_L$ into a
$D = O(\mu \log m \log^2 n)$-dimensional binary Hamming cube, with error $(1 - O(1)/\log n)$. Thus it is
sufficient for us to maintain a $c$-nearest neighbor in $\{0,1\}^D$ where $c = (1 + \varepsilon - O(1)/\log n)$,
which takes $O(n^{1/c}) = O(n^{1/(1+\varepsilon/2)})$ time per operation [IM98].

We conclude: 

Theorem 4.4 By performing an $O(n + m + n^{2/3}m^{2/3})$-time preprocessing, one can reduce
the problem of maintaining dynamic $(1+\varepsilon)$-approximate nearest neighbor for any $n$-point
crossing metric over $m$ lines, to the problem of maintaining dynamic $(1+\varepsilon)(1 - O(1)/\log n)$-
approximate nearest neighbor in Hamming space with $O(\log^7 n)$ dimensions (assuming $m =
n^{O(1)}$). The latter can be solved in $O(n^{1/(1+\varepsilon/2)})$ time per operation.



4.1 Embedding of the Crossing Metric over $\mathbb{R}^d$

In this section, we extend the methods from the previous section to the crossing metric
defined by $(d-1)$-dimensional hyperplanes in $\mathbb{R}^d$, for any fixed $d > 2$. To this end, it is
sufficient to design an efficient procedure which, given a set of $n$ points $P = \{p_1, \ldots, p_n\}$ and a
set of $m$ hyperplanes $\mathcal{H} = \{H_1, \ldots, H_m\}$, assigns a symbol $\sigma_i \in \Sigma \subseteq \mathbb{Z}$ to each $p_i$ in such a
way that $\sigma_i \ne \sigma_j$ iff there exists $H_k$ which separates $p_i$ from $p_j$. Unfortunately, the idea from
the previous section does not give a subquadratic time algorithm for $d > 2$, since even in $d = 3$
the complexity of $n$ cells in an arrangement formed by $n$ planes could be $\Omega(n^2)$. Fortunately,
for our purpose, we do not need to compute the actual cells containing the $p_i$'s. Rather, it is
sufficient to find the labels for those cells, or more specifically, a function $h : P \to \Sigma$ such
that $h(p) = h(q)$ iff $p$ and $q$ belong to the same arrangement cell.

Abusing notation, we denote by $H_k(p)$ the function returning 1 if $p$ lies on one (fixed) side of
$H_k$ and zero otherwise. We use the following hashing function

$$ h(x) = \sum_{i=1}^{m} a_i H_i(x), $$

where $a_1, \ldots, a_m$ are independent and identically distributed random variables with uniform
distribution over $\{0, \ldots, n^c\}$, where $c$ is a constant to be specified shortly. Note that if
$p, q \in \mathbb{R}^d$ lie in two different full-dimensional faces of $\mathcal{A}(\mathcal{H})$, then, as noted above, there
must be a hyperplane $H_k \in \mathcal{H}$ so that $H_k(p) \ne H_k(q)$, and say that $H_k(p) = 1$. That is,
$h(p) = h'(p) + a_k$ and $h(q) = h'(q)$, where $h'(x) = \sum_{i \ne k} a_i H_i(x)$. Since the $a_i$ were picked
independently, it follows that $h(p) = h(q)$ only if $h'(q) - h'(p) = a_k$. But the probability of
that happening is at most $1/n^c$. We conclude that the probability that two points belonging to two
different faces are mapped to the same value by $h(\cdot)$ is at most $1/n^c$. Thus, since we have $O(n^2)$
pairs of points to consider in our algorithm, it follows that the probability that the hashing
fails is $O(n^{2-c})$, which can be made arbitrarily small by picking $c$ to be large enough.
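
A direct (brute-force) rendering of this hashing scheme is sketched below. It evaluates every hyperplane for every point, so it takes O(nm) time and does not use any intersection-searching structure; it only shows how the labels are formed. The hyperplane representation (normal vector, offset), the function names, and the optional `weights` argument (useful for reproducible runs) are assumptions of this sketch.

```python
import random


def side(h, x):
    """H_k(x): 1 if x lies on the positive side of hyperplane h = (normal, offset)."""
    normal, offset = h
    return 1 if sum(a * t for a, t in zip(normal, x)) + offset > 0 else 0


def cell_labels(points, hyperplanes, weights=None, c=3, rng=None):
    """Label each point x by h(x) = sum_i a_i * H_i(x).

    If no weights are given, each a_i is drawn uniformly from {0, ..., n**c}."""
    if weights is None:
        rng = rng or random.Random()
        bound = len(points) ** c
        weights = [rng.randint(0, bound) for _ in hyperplanes]
    return [sum(a * side(h, x) for a, h in zip(weights, hyperplanes)) for x in points]


# The three coordinate planes in R^3.
planes = [((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0)]
pts = [(1, 1, 1), (2, 3, 4), (-1, 1, 1), (-1, -1, 1)]
labels = cell_labels(pts, planes, weights=[1, 2, 4])   # [7, 7, 6, 4]
```

Points in the same cell always receive the same label, whatever the weights are; only the converse direction (different cells, different labels) is probabilistic when the weights are random.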

Namely, we associate a weight $a_i$ with each half-space induced by a hyperplane $H_i$. For
each point $p_j$, we compute the total weight of all the half-spaces that contain it, and all
the points having the same total weight are associated with the same label. Computing
the weight of a point $p_j$ falls into the class of problems known as intersection-searching
[Aga97]. In particular, one can construct a data-structure in $O(m^{1+\delta})$ time, so that one can
answer intersection-searching queries in $O((n/m^{1/d}) \log^{d+1} n)$ time, where $\delta > 0$ is an arbitrarily
small constant. As the algorithm needs to perform a linear number of such queries, we set
$m = n^{2d/(d+1)}$. Thus, the algorithm computes the required labels in $O(n^{2d/(d+1)+\delta})$ time. We
conclude:

Theorem 4.5 By performing an $O(n^{2d/(d+1)+\delta})$-time preprocessing, where $\delta > 0$ is an arbitrary
constant, one can reduce the problem of maintaining dynamic $(1+\varepsilon)$-approximate nearest
neighbor for any $n$-point crossing metric over $n$ hyperplanes in $\mathbb{R}^d$, to the problem of
maintaining dynamic $(1+\varepsilon)(1 - O(1)/\log n)$-approximate nearest neighbor in Hamming space
with $O(\log^7 n)$ dimensions.



Remark 4.6 Note that the constants in the bounds of Theorem 4.5 depend exponentially
(or worse) on the dimension $d$.

Remark 4.7 As indicated in the introduction, having such an embedding enables one to
use a large collection of subquadratic approximation algorithms for the intersection metric,
including dynamic amortized $O(n^{4/3} + n^{1+1/c})$-time (for $d = 2$) $c$-approximation algorithms
for bichromatic closest pair [Epp95] and $O(n^{4/3} + n^{1+1/c})$-time algorithms for: $c$-approximate
diameter and discrete minimum enclosing ball [GIV01], and $O(c)$-approximate facility location
and bottleneck matching [GIV01]. Similar (i.e., subquadratic time) results hold for any
$d > 2$.

4.2 Computing an MST Using the Embedding 

We next describe how to use the embedding described in the previous two sections to
obtain a $(1+\varepsilon)$-approximation algorithm for the MST under the crossing metric. Note that
everything described in this section is well known [IM98], and we provide it only for the sake
of completeness. Also, the resulting algorithm is slower in the planar case than the algorithm
of Section 3.

Computing the minimum spanning tree under the intersection metric, using Kruskal's
algorithm, boils down to maintaining the bichromatic nearest-neighbor pair (under the
intersection metric) between two sets $P_1, P_2 \subseteq P$, under insertions and deletions. A consequence
of Eppstein's result [Epp95] is the following:

Theorem 4.8 ([Epp95]) Given a dynamic data-structure for nearest-neighbor queries, where
each insertion / deletion / query operation takes $T(n)$ time, one can compute the MST
in $O(nT(n) \log^2 n)$ time.

It is easy to verify that we get a $(1+\varepsilon)$-approximation to the MST if we use a $(1+\varepsilon)$-
approximate dynamic nearest-neighbor data-structure (Eppstein, personal communication,
1999).
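
For reference, the exact MST under the crossing metric can always be computed by brute force: evaluate all pairwise crossing distances and run Kruskal's algorithm with a union-find structure. The sketch below does exactly that in quadratic time (times the number of lines); it is a baseline for small inputs, not the dynamic nearest-neighbor approach of Theorem 4.8. The line representation (a, b, c) and the assumption that no point lies on a line are ours.

```python
from itertools import combinations


def crossing_distance(p, q, lines):
    """Number of lines a*x + b*y + c = 0 that separate p from q."""
    return sum(1 for (a, b, c) in lines
               if (a * p[0] + b * p[1] + c > 0) != (a * q[0] + b * q[1] + c > 0))


def crossing_mst(points, lines):
    """Kruskal's algorithm on the complete graph weighted by crossing distance."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((crossing_distance(points[i], points[j], lines), i, j)
                   for i, j in combinations(range(len(points)), 2))
    tree, weight = [], 0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
            weight += w
    return weight, tree


lines = [(1, 0, -1), (1, 0, -2), (0, 1, -1)]            # x = 1, x = 2, y = 1
points = [(0, 0), (0.5, 0.5), (1.5, 0), (3, 0), (3, 2)]
weight, tree = crossing_mst(points, lines)              # weight 3, four edges
```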

Namely, we need a data-structure that supports dynamic approximate nearest-neighbor
queries. After applying the embedding described above, we use the $\varepsilon'$-PLEB data-structure
of [IM98] to maintain a $(1+\varepsilon)$-approximate nearest neighbor in the embedded space.
Specifically, we construct an $\varepsilon'$-PLEB on the embedded points. In this way, we obtain an $\varepsilon$-PLEB
for our original points (i.e., we embedded a gap to a gap, so that a close point in the
embedded space corresponds to a close point in the crossing metric): a data-structure that, for a
query $p$, returns a point $q \in P$ so that $\mathcal{D}_L(p,q) \le (1+\varepsilon)r$, if there exists a point $q^* \in P$
so that $\mathcal{D}_L(p,q^*) \le r$.

Thus, by constructing $\log_{1+\varepsilon} n$ such data-structures, we can use binary search on those
data-structures to find a $(1+\varepsilon)$-approximate nearest neighbor to a query point. Namely,
this data-structure can be used to answer approximate nearest neighbor queries for the
intersection metric. For the whole scheme to work, we need those data-structures to be
dynamic; i.e., to support insertions and deletions of points. Fortunately, the only part of the
algorithm that needs to be dynamic is the second stage, which uses the data-structure of [IM98],
and that data-structure is dynamic.
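
The binary search over PLEB structures can be sketched as follows. The oracle `pleb(r, query)` is a hypothetical stand-in for one PLEB structure built for radius r; here it is implemented by a linear scan rather than by the data-structure of [IM98]. The radii 0, 1, (1+ε), (1+ε)², ... and all function names are assumptions of this sketch. With an oracle that obeys the PLEB contract, the search returns a point whose distance is within a factor (1+ε)² of the nearest one, which is (1+O(ε)).

```python
def approx_nearest(query, radii, pleb):
    """Binary search over increasing radii using a PLEB-style oracle.

    pleb(r, query) must return a point within (1 + eps) * r of the query whenever
    some point lies within r, and None whenever no point lies within (1 + eps) * r."""
    best = pleb(radii[-1], query)
    if best is None:
        return None
    lo, hi = -1, len(radii) - 1      # invariant: radii[hi] succeeded, radii[lo] failed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        answer = pleb(radii[mid], query)
        if answer is None:
            lo = mid
        else:
            hi, best = mid, answer
    return best


def make_bruteforce_pleb(points, dist, eps):
    """Hypothetical stand-in for a real PLEB structure: a linear scan."""
    def pleb(r, query):
        for p in points:
            if dist(p, query) <= (1 + eps) * r:
                return p
        return None
    return pleb


def geometric_radii(max_dist, eps):
    """0, 1, (1+eps), (1+eps)^2, ... up to the first value >= max_dist."""
    radii, r = [0.0], 1.0
    while r < max_dist:
        radii.append(r)
        r *= 1 + eps
    radii.append(r)
    return radii


eps = 0.5
points = [40, 7, 100, 13]
dist = lambda a, b: abs(a - b)       # toy integer metric in place of the crossing metric
pleb = make_bruteforce_pleb(points, dist, eps)
found = approx_nearest(10, geometric_radii(100, eps), pleb)
```

The radius 0 is included so that a point in the same face as the query (distance zero) is reported exactly.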
We conclude: 



Theorem 4.9 Given a set $P$ of $n$ points in the plane, and a set $L$ of $n$ lines, one can
compute in $O(n^{4/3} + n^{1+1/(1+\varepsilon)})$ time a spanning tree of $P$ of weight $\le (1+\varepsilon)W_{\mathrm{opt}}(P, L)$.
The result returned by the algorithm is correct with high probability. For $d > 2$ dimensions,
such an MST can be approximated in $O(n^{2d/(d+1)+\delta} + n^{1+1/(1+\varepsilon)})$ time, where $\delta > 0$ is an
arbitrary constant.



5 Conclusions 

We presented the first $(1+\varepsilon)$-approximation algorithm for the minimum spanning tree under
the crossing metric in the plane. We also presented subquadratic-time approximation
algorithms for a variety of other problems, obtained by embedding the crossing metric into a
higher-dimensional space. The techniques used in our paper seem to be new to low-dimensional
computational geometry, and we believe that they might be useful for other problems in
computational geometry.

There are several interesting open problems for further research: 

• Can the result be extended to other cases: segments or arcs instead of lines? 

• Can a similar approximation algorithm be found for the case of minimum weight
triangulation under the crossing metric?

A A Rough Approximation to the Weight of the MST in Near Linear Time

In this appendix, we show how to approximate the weight of the minimum spanning tree up
to roughly a factor of $O(\alpha(n) \log n)$ if its weight is at least linear. In Section 3, we presented
a near linear time algorithm for $(1+\varepsilon)$-approximation of the minimum spanning tree that
relies on this approximation algorithm.

Underlying the approximation algorithm is the observation that an MST for a random
sample of the lines of $L$ provides a rough approximation to the weight of the MST of $L$. If the
weight of the MST of the sample is near linear, we can approximate it up to a factor of
$O(\alpha(n) \log n)$, using the following algorithm.

Lemma A.1 Given a set $R$ of $r$ lines, a set $P$ of $n$ points, and a prescribed parameter $W$,
one can decide whether $W_{\mathrm{opt}}(P,R)$ is large; namely, $W_{\mathrm{opt}}(P,R) = \Omega((r + n + W)\alpha(n) \log n)$.
The algorithm takes $O((r + n + W)\alpha(n) \log^2 n)$ expected time. Furthermore, if $W_{\mathrm{opt}}(P,R) \le
W$, the algorithm will report that its weight is large with probability at most $n^{-c}$, where $c$ is
an appropriate constant.



Proof: Use the algorithm of [HS01] and execute it $O(\log n)$ times on $P$ and $R$. If
the running time of the $i$-th execution of the algorithm exceeds $\Omega((r + n + W)\alpha(n) \log n)$,
abort it, and move on to the next execution. If $W_{\mathrm{opt}}(P, R) \le W$, then the algorithm of
[HS01] provides a spanning tree of expected weight $O((r + n + W)\alpha(n) \log n)$ with the same
bound on the expected running time. Thus, if in $O(\log n)$ executions the algorithm always
reports that $W_{\mathrm{opt}}$ is large, we can conclude that with probability $\ge 1 - n^{-c}$ the weight
$W_{\mathrm{opt}}(P, R)$ is not $\le W$. ■
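
The repeat-and-abort pattern of this proof can be sketched generically. In the sketch, `run_once(budget)` is a hypothetical wrapper around a single randomized execution: it returns the tree weight, or `None` if the execution was aborted for exceeding the budget. If the budget is at least twice the expected running time, Markov's inequality bounds the chance that one execution exceeds it by 1/2, so $c \log_2 n$ independent executions all exceed it with probability at most $n^{-c}$.

```python
import math


def num_trials(n, c=2):
    """Number of independent executions giving failure probability at most n**(-c),
    assuming each execution is aborted with probability at most 1/2."""
    return math.ceil(c * math.log2(n))


def decide_large(run_once, budget, trials):
    """Return True ("the weight is large") iff every execution is aborted.

    run_once(budget) is a hypothetical callable: it returns the weight of the
    tree it found, or None if it exceeded `budget` and was aborted."""
    for _ in range(trials):
        if run_once(budget) is not None:
            return False        # one execution finished within budget: not large
    return True


trials = num_trials(1024)       # 2 * log2(1024) = 20 executions
```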
Lemma A.1 shows that we can approximate the weight of the MST in near linear time if
its weight is near linear. However, if it is heavier, we will use random sampling to keep the
running time under control.

Let $R \subseteq L$ be a random sample of lines out of $L$, where each line is picked independently
with probability $r/n$. Clearly, the probability that an intersection point $u$ (between a connected
set $\gamma$ and a line of $L$) is present in $\mathcal{A}(R)$ is $r/n$ (this is the probability that the line of
$L$ passing through $u$ is chosen to be in the random sample).

Definition A.2 For a curve $\gamma$, and a set of lines $L$, let $\mathrm{weight}(\gamma, L)$ denote the weight of $\gamma$
in the arrangement $\mathcal{A}(L)$. This is the number of intersections of $\gamma$ with the lines of $L$.
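
For a polygonal curve, this weight and its sampled counterpart can be computed directly, as in the sketch below. The representation is assumed, not taken from the paper: the curve is a list of vertices, each line is a triple (a, b, c) for ax + by + c = 0, and no vertex lies on a line. Each line is kept independently with a given probability (r/n in the text), and the sampled weight is rescaled by the inverse probability to estimate the full weight.

```python
import random


def crosses(seg, line):
    """True iff the segment's endpoints lie strictly on opposite sides of the line."""
    (p, q), (a, b, c) = seg, line
    sp = a * p[0] + b * p[1] + c
    sq = a * q[0] + b * q[1] + c
    return (sp > 0) != (sq > 0)


def weight(polyline, lines):
    """weight(gamma, L): number of intersections of the polyline with the lines."""
    segments = list(zip(polyline, polyline[1:]))
    return sum(1 for seg in segments for line in lines if crosses(seg, line))


def sample_lines(lines, prob, rng):
    """Keep each line independently with probability `prob`."""
    return [l for l in lines if rng.random() < prob]


lines = [(1, 0, -i) for i in range(1, 9)]     # x = 1, ..., x = 8
gamma = [(0.5, 0), (8.5, 0), (4.5, 1)]        # crosses all 8 lines, then 4 of them again
w_full = weight(gamma, lines)                 # 8 + 4 = 12
R = sample_lines(lines, 0.5, random.Random(0))
estimate = weight(gamma, R) / 0.5             # sampled weight rescaled by n/r
```

By linearity of expectation the rescaled estimate has expectation `w_full`; a single sample can of course deviate from it.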

Lemma A.3 Let $R$ be a sample of lines of $L$ (chosen as described above). Then with high
probability:

$$ W_{\mathrm{opt}}(P, L) \le \frac{n}{r}\left(c_0 n \log n + 2W_{\mathrm{opt}}(P, R)\right), $$

and with probability $\ge 0.9$ we have $\frac{n}{r} \cdot \frac{W_{\mathrm{opt}}(P,R)}{10} \le W_{\mathrm{opt}}(P, L)$, where $c_0$ is an appropriately
large constant.

Proof: Let $T_{\mathrm{opt}} = T_{\mathrm{opt}}(P, L)$, and let $W_R = \mathrm{weight}(T_{\mathrm{opt}}, R)$ be the weight of $T_{\mathrm{opt}}$ under the
crossing metric of $R$. Clearly, $\mathbf{E}[W_R] = W_{\mathrm{opt}}(P, L)\,\frac{r}{n}$. Thus, we know that with probability
$\ge 0.9$ we have $W_R \le 10\,W_{\mathrm{opt}}(P, L)\,\frac{r}{n}$ (by Markov's inequality), and so with probability $\ge 0.9$ we
have that $W_{\mathrm{opt}}(P, R) \le W_R \le 10\,W_{\mathrm{opt}}(P, L)\,\frac{r}{n}$.

Let $p, q \in P$ be two points, and let $X_{pq}$ be the distance between $p, q$ in the arrangement
$\mathcal{A}(R)$. If the distance between $p, q$ is large, that is $U = \mathcal{D}_L(p,q) \ge c_0 (n/r) \log n$ (where $c_0$
is a large enough constant), then one can show using the Chernoff inequality that, with high
probability, we have:

$$ \frac{U}{2} \le X_{pq}\,\frac{n}{r} \le 2U. $$

On the other hand, by the above argument, each edge $e = pq$ of $T^R_{\mathrm{opt}} = T_{\mathrm{opt}}(P, R)$ either
intersects at most $c_0 (n/r) \log n$ lines of $L$, or alternatively, the number of lines of $L$ intersected
by $e$ is smaller than $2(n/r)X_e$, where $X_e$ is the number of lines of $R$ that $e$ intersects. Thus,
with high probability, we have

$$ W_{\mathrm{opt}}(P, L) \le \mathrm{weight}(T^R_{\mathrm{opt}}, L) = \sum_{pq \in T^R_{\mathrm{opt}}} \mathcal{D}_L(p,q) \le \sum_{pq \in T^R_{\mathrm{opt}}} \left(c_0\,\frac{n}{r} \log n + 2X_{pq}\,\frac{n}{r}\right) \le c_0\,\frac{n^2 \log n}{r} + W_{\mathrm{opt}}(P, R)\,\frac{2n}{r}. $$
■



Remark A.4 We can make both probabilities in Lemma A.3 large by repeating the
experiment $O(\log n)$ times, and picking the smallest $W_{\mathrm{opt}}(P, R)$ computed. With high probability,
we have

$$ \frac{n}{r} \cdot \frac{W_{\mathrm{opt}}(P, R)}{10} \le W_{\mathrm{opt}}(P, L) \le \frac{n}{r}\left(c_0 n \log n + 2W_{\mathrm{opt}}(P, R)\right). $$

In particular, if $W_{\mathrm{opt}}(P, R) \ge c_0 n \log n$, we get that $3W_{\mathrm{opt}}(P, R)\,\frac{n}{r}$ is a constant factor
approximation to $W_{\mathrm{opt}}(P, L)$.

Lemma A.5 Let $r$ be a prescribed parameter, and $W_{\mathrm{opt}} = W_{\mathrm{opt}}(P, L)$. Then, an algorithm
can decide whether

• $W_{\mathrm{opt}}$ is small, namely $W_{\mathrm{opt}} \le 10 c_0\,\frac{n^2 \log n}{r}$.

• $W_{\mathrm{opt}}$ is large, namely $W_{\mathrm{opt}} = \Omega\!\left(\frac{n^2}{r}\,\alpha(n) \log^2 n\right)$.

• $W_{\mathrm{opt}}$ is in between. Either of the two answers above is valid.

The algorithm takes $O(n\alpha(n) \log^4 n)$ time, and returns a correct result with high probability.

Proof: We pick $m = O(\log n)$ samples $R_1, \ldots, R_m$ by picking each line with probability
$r/n$ into the sample. For each sample, we check whether $W_{\mathrm{opt}}(P, R_i) \le 10 c_0 n \log n$, using
the algorithm of Lemma A.1. This will require $O(n\alpha(n) \log^3 n)$ time for each sample, and
$O(n\alpha(n) \log^4 n)$ overall.

If the algorithm of Lemma A.1 returned "not large" for any sample $R$, we know that
$W_{\mathrm{opt}}(P, R) = O(n\alpha(n) \log^2 n)$ (this is the threshold of Lemma A.1 with $W = 10 c_0 n \log n$).
And by Lemma A.3, we know that $W_{\mathrm{opt}}(P, L) = O\!\left(\frac{n^2}{r}\,\alpha(n) \log^2 n\right)$
with high probability.

Now, we can perform a binary search to approximate the weight $W_{\mathrm{opt}}(P, L)$.

Lemma A.6 One can compute in $O(n\alpha(n) \log^5 n)$ time a value $M$, so that

$$ W_{\mathrm{opt}}(P, L) \le M = O\!\left(n\alpha(n) \log^2 n + W_{\mathrm{opt}}(P, L)\,\alpha(n) \log n\right). $$



Proof: Use Lemma A.5, and set $r_1 = n$. In the $i$-th iteration check whether $W_{\mathrm{opt}} =
\Omega\!\left(\frac{n^2}{r_i}\,\alpha(n) \log^2 n\right)$, by using the algorithm of Lemma A.5. If it is, we set $r_{i+1} = r_i/2$ and
repeat the process. We stop as soon as this check fails. Then, we know that with high
probability

$$ 10 c_0\,\frac{n^2 \log n}{r_{i-1}} \le W_{\mathrm{opt}}(P, L) = O\!\left(\frac{n^2}{r_i}\,\alpha(n) \log^2 n\right) = M, $$

implying that $M$ is the required approximation. ■
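
The search over $r$ can be sketched as a simple halving loop. Everything specific in the sketch is an assumption: `is_large(r, threshold)` is a hypothetical oracle standing in for the test of Lemma A.5, the inverse Ackermann factor α(n) is replaced by a fixed constant, the hidden constants are set to 1, and the halving step is our reading of the update rule. In the usage example the oracle is simulated with a known weight.

```python
import math


def rough_weight_bound(n, is_large, alpha=4.0):
    """Halve r while the oracle reports "large"; return the final threshold M.

    is_large(r, threshold) is a hypothetical oracle in the role of Lemma A.5.
    alpha stands in for the inverse Ackermann factor alpha(n)."""
    def threshold(r):
        return (n * n / r) * alpha * math.log2(n) ** 2

    r = float(n)
    while r > 1 and is_large(r, threshold(r)):
        r /= 2
    return threshold(r)


n, true_weight = 1024, 5.0e7
M = rough_weight_bound(n, lambda r, t: true_weight > t)
```

With an exact oracle, the returned `M` is at least the true weight and less than twice it, unless the loop stops in the first iteration, in which case `M` is just the initial threshold for r = n.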



Remark A.7 Note that if the algorithm of Lemma A.6 stops after the first iteration, then
$W_{\mathrm{opt}} = O(n\alpha(n) \log^2 n)$. In such a case the approximation we get is much worse than
logarithmic. However, this is to some extent the easiest case: without any sampling we get
a spanning tree of near linear (or sublinear) weight.
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